Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenarios. Based on retrieved rule texts, the user scenario, the user question, and the dialogue history, a machine must make a "Yes/No/Inquire" decision or, when the decision is "Inquire", generate a follow-up question. Recent studies have explored methods to reduce the information gap between decision-making and question generation and thus improve generation performance. However, the information gap persists because these pipelines remain limited to three stages: decision-making, span extraction, and question rephrasing. Decision-making and generation reason separately, and the entailment reasoning used in decision-making is hard to share across all stages. To tackle this problem, we propose a novel one-stage end-to-end framework, called Entailment Fused-T5 (EFT), that bridges the information gap between decision-making and generation through global understanding. Extensive experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on the OR-ShARC benchmark.
Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy/beam search) often lead to the degeneration problem, i.e., the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness prone to incoherence or require a look-ahead mechanism that demands extra computational overhead. In this study, we formulate open-ended text generation from a new perspective, i.e., as an exploration process within a directed graph. Thereby, we understand the phenomenon of degeneration as circular loops within the directed graph. Based on our formulation, we propose a novel decoding method -- \textit{momentum decoding} -- which encourages the LM to \textit{greedily} explore new nodes outside the current graph, while also allowing the LM to return to existing nodes with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while offering notably faster inference and fewer computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our code and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding.
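The abstract describes the mechanism only at a high level; the following is a minimal toy sketch of the idea, assuming a HuggingFace causal LM. The loop-detection bookkeeping, the top-10 candidate scan, and the `RESISTANCE` schedule (keyed by how deep a repetition loop a candidate would close) are our illustrative guesses, not the authors' exact formulation; the linked repository has the real implementation.

```python
# Toy sketch: greedy decoding with a resistance penalty on loop-closing tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

RESISTANCE = {0: 0.0, 1: 1.0, 2: 3.0, 3: 6.0}  # hypothetical penalty per loop depth


def loop_depth(ids, cand):
    """Longest k such that ids[-k:] + [cand] already occurs inside ids."""
    best = 0
    for i, tok in enumerate(ids):
        if tok == cand:
            k = 0
            while k < i and k < len(ids) and ids[i - 1 - k] == ids[len(ids) - 1 - k]:
                k += 1
            best = max(best, k)
    return best


def momentum_decode(prompt, max_new_tokens=64, model_name="gpt2"):
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(prompt, return_tensors="pt").input_ids[0].tolist()
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = lm(torch.tensor([ids])).logits[0, -1]
        log_probs = torch.log_softmax(logits, dim=-1)
        # Scan only the top-10 candidates for speed; a token that revisits an
        # existing node pays a resistance that grows with the loop depth, so
        # generation is greedy on new nodes and reluctant to close cycles.
        best, best_score = None, float("-inf")
        for cand in torch.topk(log_probs, 10).indices.tolist():
            depth = loop_depth(ids, cand)
            score = log_probs[cand].item() - RESISTANCE.get(min(depth, 3), 6.0)
            if score > best_score:
                best, best_score = cand, score
        ids.append(best)
    return tok.decode(ids)
```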
Prompt learning has recently become an effective linguistic tool for eliciting knowledge from PLMs on few-shot tasks. However, studies have shown that prompt learning still lacks robustness, since suitable initialization of continuous prompts and expert-crafted manual prompts are essential to the fine-tuning process. Moreover, humans also use their comparative ability to draw on existing knowledge when distinguishing between examples. Motivated by this, we explore how to use contrastive samples to strengthen prompt learning. In detail, we first propose our model ConsPrompt, which combines a prompt encoding network, a contrastive sampling module, and a contrastive scoring module. Subsequently, two sampling strategies, similarity-based and label-based, are introduced to realize differential contrastive learning. The effectiveness of the proposed ConsPrompt is demonstrated on five different few-shot learning tasks, showing that the similarity-based sampling strategy is more effective than the label-based one when combined with contrastive learning. Our results also exhibit state-of-the-art performance and robustness across different few-shot settings, which proves that ConsPrompt can serve as a better knowledge probe for motivating PLMs.
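As a rough illustration of the two sampling strategies named above, the snippet below contrasts label-based and similarity-based selection of positives and negatives for a given anchor example. All function and variable names are ours, not from the ConsPrompt codebase.

```python
# Toy sketch of the two contrastive sampling strategies.
import torch
import torch.nn.functional as F


def label_based_sampling(labels, anchor):
    """Positives share the anchor's label; negatives do not."""
    pos = [i for i, y in enumerate(labels) if y == labels[anchor] and i != anchor]
    neg = [i for i, y in enumerate(labels) if y != labels[anchor]]
    return pos, neg


def similarity_based_sampling(embeddings, anchor, k=4):
    """Rank candidates by cosine similarity to the anchor's prompt encoding;
    the nearest k become positives, the farthest k become negatives."""
    sims = F.cosine_similarity(embeddings[anchor].unsqueeze(0), embeddings, dim=-1)
    sims[anchor] = float("-inf")  # exclude the anchor itself
    order = torch.argsort(sims, descending=True)
    return order[:k].tolist(), order[-k:].tolist()
```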
Conversational machine reading comprehension (CMRC) aims to help computers understand natural language text and then engage in multi-turn conversation to answer questions about the text. Existing methods typically require three steps: (1) decision-making based on entailment reasoning; (2) span extraction, if required by the above decision; and (3) question rephrasing based on the extracted span. However, for nearly all of these methods, the span extraction and question rephrasing steps cannot fully exploit the fine-grained entailment reasoning information from the decision-making step, because their relative independence further widens the information gap between decision-making and question phrasing. To address this problem, we propose a novel end-to-end framework for conversational machine reading comprehension based on a shared-parameter mechanism, called Entailment Reasoning T5 (ET5). Despite the lightweight nature of our proposed framework, experimental results show that ET5 achieves new state-of-the-art results on the ShARC leaderboard with a BLEU-4 score of 55.2. Our model and code are publicly available at https://github.com/yottaxx/et5.
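To make the shared-parameter idea concrete, here is a hedged sketch in which a single T5 handles both decision-making and question rephrasing via task prefixes, so the two sub-tasks share all weights. The prefix strings and input layout are our assumptions, and an off-the-shelf t5-base would need fine-tuning on ShARC-style data before the outputs become meaningful; the authors' actual model is at the linked repository.

```python
# Sketch: one T5, two task prefixes, fully shared parameters.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")


def run(task, rule_text, scenario, question, history):
    # Both tasks see the same serialized context; only the prefix changes.
    src = (f"{task}: rule: {rule_text} scenario: {scenario} "
           f"question: {question} history: {history}")
    ids = tok(src, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
    return tok.decode(out[0], skip_special_tokens=True)


# Hypothetical example inputs (outputs require task-specific fine-tuning).
rule = "You can claim the benefit if you are over 65."
scen, q, hist = "I am retired.", "Can I claim the benefit?", ""
decision = run("decision", rule, scen, q, hist)      # e.g. "Yes" / "No" / "Inquire"
if decision == "Inquire":
    follow_up = run("question", rule, scen, q, hist)  # rephrased follow-up question
```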
Recently, to improve unsupervised image retrieval performance, many unsupervised hashing methods have been proposed that design a semantic similarity matrix based on the similarities between image features extracted by a pre-trained CNN model. However, most of these methods tend to ignore the high-level abstract semantic concepts contained in images. Intuitively, concepts play an important role in computing the similarity between images: in real-world scenarios, each image is associated with certain concepts, and the more identical concepts two images share, the greater their similarity. Inspired by this intuition, in this work we propose a novel unsupervised hashing method with semantic concept mining, called UHSCM, which leverages a vision-language pretraining (VLP) model to construct a high-quality similarity matrix. Specifically, a set of randomly selected concepts is first collected. Then, by prompt engineering a VLP model, which has shown strong power in visual representation learning, the concept set is denoised according to the training images. Next, UHSCM applies the VLP model with prompting once more to mine the concept distribution of each image and constructs a high-quality semantic similarity matrix based on the mined concept distributions. Finally, with the semantic similarity matrix as guiding information, a novel hashing loss with a contrastive-loss-based regularization term is proposed to optimize the hashing network. Extensive experiments on three benchmark datasets show that the proposed method outperforms state-of-the-art baselines on the image retrieval task.
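The core pipeline step, prompting a VLP model for each image's concept distribution and turning shared concept mass into a similarity matrix, might look roughly like the sketch below, here using CLIP through HuggingFace. The prompt template and the normalized dot-product similarity are illustrative choices on our part, not the paper's exact recipe.

```python
# Sketch: concept distributions via CLIP prompting -> semantic similarity matrix.
import torch
from transformers import CLIPModel, CLIPProcessor


def concept_similarity_matrix(images, concepts,
                              model_name="openai/clip-vit-base-patch32"):
    """`images`: a list of PIL images; `concepts`: a list of concept strings."""
    model = CLIPModel.from_pretrained(model_name).eval()
    proc = CLIPProcessor.from_pretrained(model_name)
    prompts = [f"a photo of a {c}" for c in concepts]  # illustrative template
    inputs = proc(text=prompts, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Each image's concept distribution: softmax over its concept logits.
    dist = out.logits_per_image.softmax(dim=-1)         # (N_images, N_concepts)
    # Images sharing more concept mass score higher under the dot product.
    dist = torch.nn.functional.normalize(dist, dim=-1)
    return dist @ dist.t()                              # (N_images, N_images)
```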
Unsupervised question answering is an attractive task due to its independence from labeled data. Previous works usually use heuristic rules together with pre-trained models to construct data and train QA models. However, most of these works treat named entities (NEs) as the only answer type, ignoring the high diversity of answers in the real world. To tackle this problem, we propose a novel unsupervised method with diversified answers, named DiverseQA. Specifically, the proposed method consists of three modules: data construction, data augmentation, and a denoising filter. First, the data construction module extends an extracted named entity into a longer sentence constituent as a new answer span, constructing a QA dataset with diverse answers. Second, the data augmentation module adopts an answer-type-dependent data augmentation process via adversarial training at the embedding level. Third, the denoising filter module is designed to alleviate noise in the constructed data. Extensive experiments show that the proposed method outperforms previous unsupervised models on five benchmark datasets, including SQuADv1.1, NewsQA, TriviaQA, BioASQ, and DuoRC. Moreover, the proposed method shows strong performance in the few-shot learning setting.
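As a toy illustration of the answer-extension idea in the data construction module, the sketch below grows a named entity into the larger constituent around its syntactic head using spaCy. The extension rule here is our guess (the paper's exact rules may differ), and the resulting span depends on the parser.

```python
# Sketch: extend a named entity to the subtree of its syntactic head.
import spacy

nlp = spacy.load("en_core_web_sm")


def extend_answers(sentence):
    doc = nlp(sentence)
    answers = []
    for ent in doc.ents:
        # Take the contiguous subtree rooted at the entity head's parent,
        # which may yield e.g. "the physicist Albert Einstein" for "Einstein".
        head = ent.root.head
        span = doc[head.left_edge.i : head.right_edge.i + 1]
        answers.append(span.text if len(span) > len(ent) else ent.text)
    return answers


print(extend_answers("The theory was proposed by the physicist Albert Einstein in 1915."))
```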
Indonesian is an agglutinative language with a complex word-formation process. Translation models for this language therefore need a mechanism below the word level, known as the subword level. This compounding process leads to a rare-word problem, since the vocabulary size explodes. We propose a strategy to address the unique-word problem of neural machine translation (NMT) systems that use Indonesian as one language of the pair. Our approach uses a rule-based method to transform a word into its root with accompanying affixes, preserving its meaning and context. A rule-based algorithm has a further advantage: it requires no corpus data, only the application of standard Indonesian morphological rules. Our experiments confirm that this approach is practical. It reduces the vocabulary size substantially, to 57%, and on English-to-Indonesian translation this strategy yields an improvement of up to 5 BLEU points over a comparable NMT system that does not use the technique.
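A minimal sketch of the rule-based segmentation idea follows: strip common Indonesian affixes so that rare surface forms map to a shared root plus explicit affix tokens. The affix inventory here is a small illustrative subset of full Indonesian morphology, and the `@@` joiner is our own convention, not necessarily the paper's.

```python
# Sketch: rule-based Indonesian affix splitting for NMT preprocessing.
PREFIXES = ["meng", "mem", "men", "me", "ber", "di", "ter", "pe"]
SUFFIXES = ["kan", "an", "i", "nya"]


def segment(word):
    prefix = suffix = ""
    for p in PREFIXES:
        # Only strip if a plausible root (>= 3 chars) remains.
        if word.startswith(p) and len(word) - len(p) >= 3:
            prefix, word = p, word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            suffix, word = s, word[:-len(s)]
            break
    # Emit affixes as separate tokens so the NMT vocabulary stays small
    # while the meaning-bearing pieces are preserved.
    return [t for t in (prefix + "@@" if prefix else "", word,
                        "@@" + suffix if suffix else "") if t]


print(segment("membacakan"))  # -> ['mem@@', 'baca', '@@kan']
```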
Twitter contains an enormous amount of linguistic data from the real world. We examine user-generated content on Twitter in low-resource languages such as local Indonesian languages. For NLP to work for Indonesian, it must account for local dialects, geographic context, and the regional cultures that influence Indonesian languages. This paper identifies the problems we face in building local Indonesian NLP datasets. Furthermore, we are developing a framework for creating, collecting, and classifying local Indonesian NLP datasets, using Twitter's geolocation tool for automatic annotation.
Traditional Chinese medicine (TCM) is a natural, safe, and effective therapy that has spread and been applied worldwide. The unique TCM diagnosis-and-treatment system requires a comprehensive analysis of a patient's symptoms hidden in clinical records written in free text. Previous studies have shown that this system can be informatized and made intelligent with the aid of artificial intelligence (AI) technologies such as natural language processing (NLP). However, existing datasets lack sufficient quality or quantity to support the further development of data-driven AI technologies in TCM. Therefore, in this paper we focus on the core task of the TCM diagnosis-and-treatment system, syndrome differentiation (SD), and we introduce the first public large-scale dataset for SD, called TCM-SD. Our dataset contains 54,152 real-world clinical records covering 148 syndromes. Furthermore, we collect a large-scale unlabeled text corpus in the TCM domain and propose a domain-specific pre-trained language model, called ZY-BERT. We conducted experiments with deep neural networks to establish strong performance baselines, reveal various challenges in SD, and demonstrate the potential of domain-specific pre-trained language models. Our study and analysis reveal opportunities to incorporate knowledge from computer science and linguistics to explore the empirical validity of TCM theories.
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though instantiations share the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency for modeling short-distance dependencies and Transformer-like dynamic modeling capability for learning long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models while trading off model accuracy and efficiency well.
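In the spirit of the iRMB description above, here is a hedged PyTorch sketch of an inverted-residual block that mixes tokens with both multi-head self-attention (long-distance interactions) and a depthwise convolution (short-distance dependencies) inside an expand-project structure. The exact ordering, normalization, and expansion ratio used in EMO may differ; this is only a sketch of the combined design.

```python
# Sketch: inverted-residual block combining attention and depthwise conv.
import torch
import torch.nn as nn


class IRMBSketch(nn.Module):
    def __init__(self, dim, expand=4, heads=4):
        super().__init__()
        hidden = dim * expand
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.expand = nn.Conv2d(dim, hidden, 1)                           # 1x1 expand
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # depthwise
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, dim, 1)                          # 1x1 project

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Long-distance mixing: self-attention over flattened spatial tokens.
        tokens = self.norm(x.flatten(2).transpose(1, 2))    # (B, HW, C)
        attn, _ = self.attn(tokens, tokens, tokens)
        x = x + attn.transpose(1, 2).reshape(b, c, h, w)
        # Short-distance mixing: inverted-residual conv path.
        y = self.project(self.act(self.dw(self.expand(x))))
        return x + y
```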